ABSTRACT

This paper seeks to discover whether the known inaccuracy of informant
recall  about  their  communication  behavior  can  be  accounted  for  by 
experimentally  varying  the  time  period  over  which  recall  takes  place.  The 
experiment  took  advantage  of  a  new  communications  medium  (computer 
conferencing) which enabled us to monitor automatically all the interactions
involving a subset of the computer network. The experiment itself
was  administered  entirely  by  the  computer,  which  interviewed  informants 
and  recorded  their  responses. 

Variations in time period failed to account for much of the inaccuracy,
which continues, as in previous experiments, at an unacceptably high level.
One  positive  finding  did  emerge:  although  people  do  not  know  with  whom 
they  communicate,  people  en  masse  seem  to  know  certain  broad  facts  about 
the  communication  pattern.  All  other  findings  were  negative.  For  example, 
it  is  impossible  to  predict  the  people  an  informant  claimed  to  communicate 
with  but  did  not;  and  it  is  impossible  to  predict  who  the  five  people  are 
that  an  informant  forgot  to  mention  that  she  or  he  had  had  communication 
with. 

Thus,  despite  their  presumed  good  intentions,  what  people  say  about 
their  communications  bears  no  resemblance  to  their  behavior.  This 
immediately makes suspect all forms of data gathering based on questions
which  require  that  informants  recall  their  behavior. 
1. Introduction


Much of social science is conducted by asking informants to describe
their behavior. This is true of studies of such disparate things as
organizational communications, food consumption, child rearing practices,
sex role behavior, and so on. Studies of naturally-occurring behavior
fall into two groups, for our purposes: those in which it is possible
to check directly the accuracy of informants' reports, and those in which
it is not possible to do so. Social network data are typically of the
latter kind; it is simply too unwieldy to check the accuracy of informants'
responses to questions such as "who do you talk to?" Besides, if
one could easily check the responses, then why ask informants questions
in the first place?

Now it is obviously very important in any field to collect accurate
data. Otherwise, theoretical deductions made from data (e.g. about social
structure) will be, at best, suspect. The validity of data about human
behavior has long been a source of vexation; La Pierre (1934) appears to
have been among the first researchers to approach the problem experimentally.
In a classic study, he toured the United States with a Chinese couple,
staying at hotels and eating in restaurants along the way. They were
served in 251 establishments, and were refused service in only one.

Six months after the trip was over, La Pierre obtained questionnaire
responses from 128 of the establishments. Ninety-two percent claimed
that they would not "accept members of the Chinese race as guests."

Since  then  a  great  deal  of  research  has  shown  that  attitudes  just  do  not 
predict  behavior  in  most  cases.  Deutscher  (1972)  has  reviewed  much  of 
the  literature  up  to  1970;  and  McGuire  (1975)  has  wondered  in  print  why 
researchers  remain  preoccupied  with  attitudes  at  all. 


If  the  problem  were  simply  of  correspondence  between  attitudes  and 
behavior,  then  it  could  be  circumvented  by  asking  people  what  they  do 
rather  than  how  they  feel  about  certain  things.  Imagine,  for  example, 
what might have happened had La Pierre asked his respondents if they
ever had given service to a Chinese person. One might assume that asking
people what they do is a better proxy for what they do than asking them
how  they  feel. 

Unfortunately, as far as we are aware, this turns out not to be the
case. For example, since at least 1951 (Meredith et al.) researchers of
human nutrition have known that people do not recall with any accuracy
what they eat, even in "the past 24 hours." Researchers have been
attempting to deal with the problem continuously since then, and especially
in the last ten years (see, for example, Beaton et al., 1979, and Greger
and Etnyre, 1978).

In human communications research, it appears that researchers have
been rather more trusting of their data. We are unaware of any research
before 1969 which addresses the problem in any way, except for isolated
calls that data accuracy should be checked (Tagiuri, Blake and Bruner, 1953).
In 1969, however, Hammer, Polgar and Salzinger, as part of a study of
speech predictability, were forced to conclude that informants' cognition
"does not constitute an adequate substitute for observation [of behavior]."
This pessimistic conclusion appears to have been universally ignored by
students of social networks (we ourselves were unaware of it until recently).

In 1975, we began a series of papers (Killworth & Bernard, 1976;
Bernard & Killworth, 1977; Killworth & Bernard, 1979a; Bernard, Killworth
and Sailer, 1980; hereafter referred to as A [Accuracy] I-IV) to examine
the accuracy of informants' cognition about one form of their behavior,
specifically the response to the question "who do you talk to?" This
involved studying many naturally-occurring groups whose behavior was
either automatically, or at least fairly unobtrusively, monitored. We
compared the answers to the question "who do you talk to?" (recall data)
with the actual communications of the informants (behavioral data).

Our main conclusion was that informants cannot recall with acceptable
accuracy whom they communicate with in a group over a period of time.
For example, informants claim they talk to people they never actually talk
to; they claim they never talk to people they do talk to; and they are
unable to rank or scale their communications accurately even when referring
to the people with whom they have communicated the most.

We  considered  the  possibility  that  individual  differences  among 
informants  (on  socioeconomic  indicators,  or  on  how  accurate  they  felt 
they  were,  for  example)  might  help  to  account  for  variation  in  their 
accuracy (AII). We have found nothing that accounts for substantial
parts of variation in informant accuracy. We also considered the possibility
that different structures of groups of communicants might be related
to accuracy of communication recall. We tested many different triadic
structures, and again found nothing to account for variation in informant
accuracy, though we did find that both recall data and actual communication
data possess significantly high or low amounts of structure on every
structural indicator we could think of. Unfortunately, the structures in
any particular set of recall data were never produced by the same triads
as those in the matched set of behavior data (AIII).

Finally,  we  considered  the  possibility  that  informant  accuracy  is  a 
function  of  sub-group  organisation  (AIV).  Perhaps  modern  clique-finding 
algorithms  might  uncover  an  essential,  underlying  agreement  between  recall 
and behavior data? Again, this turned out not to be the case. The three
clique-finders  we  used  (chosen  because  they  represent  three  major  traditions 
in  the  literature)  failed  to  produce  similar  cliques  in  our  matched  sets 
of  recall  and  behavior  data  (or  with  each  other). 

Of  course,  it  is  possible  that  informant  characteristics  really  are 
responsible  for  variations  in  accuracy  of  communications  recall  data 
(or  any  behavioral  recall  data).  It  may  be  that  we  have  simply  not  made 
the  correct  comparisons.  Similarly,  there  may  be  triadic  structures  which 
would  give  better  answers  than  those  we  have  tested;  and  there  are  certainly 
many clique-finders which we have not examined. But the search would be
endless.  Clearly,  another  approach  is  needed. 

In this paper we examine the possibility that the inaccuracy we have
found is a function of the time period over which informants are asked to
recall their behavior. All our previous data sets have been based on
informant reports of their behavior during one of three "windows":
the previous five days; the previous month; and the forthcoming month.
Any period of time, or window, can be characterised by two quantities,
which we call "lag" and "width." Width is the amount of time over which
informants are asked to recall their behavior. Lag is the amount of time
that has elapsed since the end of the window. Thus, the five-day windows
in some of our previous experiments have a width of five days, and a lag
of, at most, one day.
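
As an illustration of these two quantities, a window can be stored as a width and a lag and turned into a concrete time interval relative to the moment of the interview. The sketch below is not part of the experiment's software; the class name `Window`, the method `interval`, and the example interview date are hypothetical, and the lag is set to exactly one day only for concreteness.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Window:
    """A recall window: `width` of time recalled, ending `lag` before the interview."""
    width: timedelta
    lag: timedelta

    def interval(self, interview_time: datetime) -> tuple[datetime, datetime]:
        """Return (start, end) of the window for an interview at `interview_time`."""
        end = interview_time - self.lag      # the window closed `lag` ago
        start = end - self.width             # and covers `width` before that
        return start, end


# A five-day window with a one-day lag (illustrative values).
five_day = Window(width=timedelta(days=5), lag=timedelta(days=1))
start, end = five_day.interval(datetime(1979, 1, 10, 12, 0))
```

Behavioral records whose timestamps fall between `start` and `end` would then be the ones compared against the informant's recall for that window.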

The  majority  of  questions  asked  by  students  of  social  networks  have 
a  lag  of  less  than  one  day,  with  widths  that  range  from  a  few  days  to 
the  life  time  of  the  informant.  It  seems  plausible  that  very  recent  time 
windows should tend to be more accurate than windows far in the past.
"Who did you talk to one minute ago?" should yield more accurate data
than "who did you talk to for a minute at this time last month?" Similar
variations in accuracy could be caused by different widths: "who did
you talk to during a period of a week, a month ago?" Is there a combination
of lag and width which yields the most accurate social network data?

In  order  to  test  this,  we  conducted  a  totally  automated  experiment 
using  a  computer-based  communication  system  known  as  EIES.  Both  behavioral 
and  recall  data  were  gathered  by  the  computer.  In  section  2,  we  describe 
the communications medium, and in section 3, we provide details about the
experiment itself and the data acquisition.

2.  EIES:  A  Computer-Based  Communications  Medium 

Prior  to  this  experiment,  all  our  work  had  been  on  single  time 
windows.  In  order  to  study  the  accuracy  of  recall  over  multiple  time 
windows,  either  of  two  things  is  required:  a)  many  experiments,  on  many 
different groups, over many windows; or b) a single experiment on a group
engaged  in  continual  conversation  over  a  long  period  of  time. 

An  ideal  example  of  the  latter  case  is  the  Electronic  Information 
Exchange  System  (EIES)  at  the  New  Jersey  Institute  of  Technology.  The 
system  was  developed  and  funded  by  the  National  Science  Foundation  as  a 
means  of  improving  communication  among  scientists.  The  idea  was  to 
enable  scientists  to  communicate  via  computer  rather  than  on  a  face-to-face 
basis, and to improve their scholarly productivity.

A complete description of EIES, including its technology and design
philosophy, may be found in Hiltz and Turoff (1978). Briefly, EIES allows
an individual to exchange messages with others on the system by leaving
the message in a central computer for pick-up the next time the
"receiver" logs on. Messages may be addressed to single individuals, with
or without copies to other individuals. Messages may also be sent to
"groups." A typical group on EIES consists of between 10 and 100 people
who have common interests and who are working on a common problem. Many
groups on EIES are composed of scientists who have held ongoing "conferences"
for periods of up to two years since the introduction of EIES.
Members of a group are free to enter into small or large conferences with
subsets of their own groups, or of other groups.

"Conference  comments"  are  a  kind  of  public  message  submitted  by  a 
conferee for all members of a conference to read. Conference topics
range from broad, theoretical discussions of, for example, general systems
theory to very specific work-group discussions of, for example, data
manipulation techniques. One EIES group planned and executed the experiment
reported in this paper.

"Private messages" are communications between individuals; only the
sender or the addressees of a private message are privileged to access
that message. Private messages include side remarks about conferences;
personal letters between friends, enemies and colleagues; and chit-chat
between casual EIES acquaintances. Every EIES participant can be identified
and addressed by name, nickname, or number (e.g., H. RUSSELL BERNARD,
RUSS, or 357).

In  other  words,  conferences  function  like  the  formal  organisations 
of  a  business  or  university  department.  The  private  messages  replace 
what might be called the "day-to-day communication network," where people
talk about work and more casual social relations. Many studies of social
networks in such environments have been conducted; the advantage of EIES
for our purposes is that every non-formal communication (i.e., private
message) can be permanently recorded. The privacy of the content of those
messages is zealously guarded. We do not treat the content of messages
in the experiment, only what is known as "who-to-whom traffic," or who
communicated with whom, and for how many lines of type.

At first glance, EIES may appear to be a rather "exotic" communications
medium  for  a  naturalistic  study.  After  all,  the  overwhelming  majority 
of  scientists,  much  less  the  rest  of  the  world,  do  not  (yet)  communicate 
via  computer.  Some  of  the  data  used  in  AI-IV  (teletype  messages  between 
deaf  people,  voice  activated  tape  recordings  of  ham  radio  operators)  might 
also  appear  esoteric.  There  are  at  least  two  reasons  why  EIES  is  a 
legitimate  medium  for  the  experimental  study  of  communications  recall, 
and  is  not  exotic. 

1) The group is simply not exotic for what we are studying. It
occurs naturally and involves a subset of the population we wish to study.
Some subsets are indeed larger than others; there are more than three
hundred thousand ham radio operators in the United States alone, and there
are more deaf teletype users than there are computer conferencers. But
they are all human beings, of the same general cultural background, whose
accuracy of recall we are interested in testing. Clearly, we cannot
generalize about the structures found in such groups to the world at large.
But we can (and do) generalize about our informants' ability to recall their
communications. 

2)  It  is  true  that  teletypes,  radios,  and  computers  are  relatively 
rare  media  of  communication.  However,  it  turns  out  that  the  accuracy  of 
informants who use these media is just as poor as that of informants who
don't. We have studied several face-to-face groups: two offices (AII)
and a fraternity (Killworth and Bernard 1979b). All of the previous work,
then, indicates that one should not expect EIES to be a "special case."
Indeed, it turns out not to be.
Given that we are interested in comparing human recall of communication
with actual behavior, EIES is an ideal experimental medium.


3.  The  Experiment 

Between  December,  1978  and  April,  1979,  57  paid  volunteer  EIES 
users  participated  in  our  experiment.  They  ranged  in  age  from  18  to  64, 
and included students and scientists from many different fields. An
invitation to participate in the experiment was sent to over 150 EIES
members via a personal message from Bernard. Depending on the rate of
their EIES use, each informant took up to 37 interviews, each for a
specific lag and width. When an informant logged in to EIES, the computer
selected a window and administered an interview. The informant was
asked to list the people with whom he or she communicated during that
window. Next, informants were given an opportunity to add or to delete
names from the list, and were asked to estimate the number of messages
and the number of lines sent to and received from each communicant
recalled. Finally, they were asked to rate their confidence, in several
different ways, on a scale from 1-7, about the information provided.

At the end of each interview, informants were given the opportunity
to send the experimenters a message containing any observations or
suggestions they wished to make. Twenty-seven windows were established
according to the pattern shown in Table 1. Windows were selected for
informants in random order. The window selection was modified by computer
throughout the experiment to ensure even coverage of all the windows in the
experiment. The remaining 10 windows we call "last-ons"; for these
windows people were asked to recall their communications during the last
time they were on EIES. This ranged from several weeks to several minutes
in lag. Twenty-three informants completed all 37 windows and both interviews,
and, out of 57 informants, no regular window was taken fewer than 32 times
or more than 38 times. Twenty-two informants took all 10 last-on windows,
and 37 people took at least one.

On EIES there is a phenomenon called "deleted" messages: messages
sent, and possibly received, but then purged from EIES before our data
collection routines could collect them. Eight percent of the 1211 interviews
are contaminated by deleted messages, but never by more than one
message per interview.

Two  questionnaires  were  also  administered  by  the  computer.  The 
first  interview  collected  data  on  all  our  informants'  age,  sex,  self- 
reported  EIES  use,  and  seven  self-reported  estimates  of  memory  (e.g.  "how 
well,  on  a  scale  from  1-7,  do  you  remember  birthdays?",  "how  well  names?", 
etc.) The second interview was taken by the 22 informants who completed
all 27 of the basic window interviews. It again asked for information on
EIES use, and also asked informants about the 20 people with whom they
had actually communicated most. For each of those 20, informants were
asked to rate (on a scale of 1-7) the importance of the communication,
how  satisfying  it  was,  how  desirable  communication  was  with  that  person, 
and  how  interesting  it  was. 

Data  collection  in  this  experiment  was,  in  a  sense,  scheduled  at 
the leisure of the informant, and performed by the central computer
itself. Thus, it was possible to allow our respondents some control over
the progress of interviews. An informant could withdraw from the experiment
(permanently or temporarily) at any time. Informants could check on
their  own  accuracy  for  the  previously  completed  interviews  by  using  a 
routine  called  "feedback."  They  could  also  check  on  their  general  progress 
by  using  a  routine  called  "windows." 
Two other routines were introduced which we felt might illuminate
the causes of variation in informant accuracy. These were called "raincheck"
and the "harassment limit." The interviews were administered
randomly at the very beginning of an EIES session at a rate sufficient
to keep all the subjects at the same pace. For any given interview, a
respondent was allowed to take a raincheck of from 1-7 days. (This was
changed to 1-3 days later in the experiment, since we felt things were
going too slowly.) After taking a raincheck, there was no way a respondent
could avoid an interview the next time he or she logged onto EIES.

The  harassment  limit  was  the  maximum  amount  of  bother  that  an 
informant was willing to put up with in one session. After each interview,
which averaged about 6-8 minutes, if sufficient time was left
in the harassment limit, a last-on window was administered. Most informants
selected 20 minutes as their harassment limit.

All the software for the experiment was written by Peter and Trudy
Johnson-Lenz. This included all the routines which kept track of the
behavioral data, as well as those which administered the interviews
and which allowed participants to enter or withdraw from the experiment,
to check on their progress, and so forth. David Harvey and the EIES
technical staff at the Computerized Conferencing and Communications Center
at the New Jersey Institute of Technology wrote the data from disk to
tape. The success of this experiment is due entirely to the hard work
of these individuals.


4. Measuring Accuracy


There  are  various  ways  one  might  want  to  measure  accuracy;  each  way 
is  a  function  of  what  a  researcher  might  want  to  do  with  the  recall  data 
at  his  or  her  disposal.  For  example,  if  the  data  were  gathered  in  the 
form  "who  are  the  three  people  you  communicate  with  the  most?"  then  the 
researcher would only require that the three persons named by an informant
were indeed the three most frequently communicated with persons in the
informant's network. Furthermore, the ordering of the three would clearly
be irrelevant. Another researcher might want to know the entire network
of each person; he or she would then require that all and only those
people spoken to by each informant be named. Yet another researcher
might be analyzing the frequency (number of contacts or messages) or
amount (number of lines, or words, or minutes) of communication. He or
she would have far more stringent requirements on accuracy than the first
researcher, who needed only three names. Clearly, different research
goals invoke different definitions of "accuracy."

For  our  purposes,  we  concocted  48  different  measures  of  accuracy, 
most  of  which  were  used  previously  in  this  series  of  papers.  They  fall 
into  broad  classes  which  make  them  easy  to  describe. 

Each  measure  is  computed  separately  for  messages  the  informant 
recalls  sending  to  people,  those  from  people,  and  those  both  to  and  from, 
combined,  shown  in  Table  2  as  T,F,B. 

The  first  six  classes  use  only  the  names  of  those  recalled  and  those 
actually communicated with. (Measures that use "number of messages," and
"number of lines" as indicators of intensity of messaging follow.) T1,
T1P, and T2P are straightforward. T12A counts the number of mistakes
(T1 + T2) as meaningful in relation to the total number of people actually
communicated with. T12AR counts the number of mistakes as a percentage
of the total number of possible mistakes (NA + NR), given the number of
people recalled and the number of people actually communicated with for
that informant and window.
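
Read literally, the name-based measures can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function name `name_measures` is hypothetical, T1 is taken to be the number of people recalled but not actually communicated with, T2 the number communicated with but not recalled, NR the number recalled and NA the number actually communicated with, with T1P = T1/NR and T2P = T2/NA. The small "started" adjustment the paper applies to its percentages is deliberately omitted here.

```python
def name_measures(recalled, actual):
    """Name-based inaccuracy measures for one informant and one window.

    `recalled` and `actual` are collections of person identifiers.
    Ratios with a zero denominator are returned as None (undefined).
    """
    recalled, actual = set(recalled), set(actual)
    nr, na = len(recalled), len(actual)
    t1 = len(recalled - actual)   # recalled but not actually communicated with
    t2 = len(actual - recalled)   # communicated with but not recalled

    def ratio(num, den):
        return num / den if den else None

    return {
        "T1": t1,
        "T2": t2,
        "T1P": ratio(t1, nr),               # errors of commission per person recalled
        "T2P": ratio(t2, na),               # errors of omission per actual communicant
        "T12A": ratio(t1 + t2, na),         # all mistakes relative to NA; can exceed 1
        "T12AR": ratio(t1 + t2, na + nr),   # all mistakes relative to possible mistakes
    }
```

Because T12A divides by NA alone, an informant who names many people never contacted can score above 1, whereas T12AR is bounded by 1.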

The second and third classes of inaccuracy measures use either
"number of messages" or "number of lines" as indicators of intensity of
communication, noted in the table as M or L. This allows us to rank
the recalled and actual communicants, and to see, for instance, whether
people can recall with accuracy those people with whom they communicate
most.

TOP5, TOP3, and TOP1 measure the percentage of errors people make
about those they report as their most frequent communicants. WIN2 suggests
that people might be able to recall those people most frequently
communicated with, but that the exact ranks might be off by 2 or so, and
still be counted as correct. WIN20 should indicate when a person recalls
those actually communicated with in the correct order, but does not penalize
the informant for leaving people out randomly.
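
One possible reading of the TOP measures is sketched below: rank the recalled communicants by recalled intensity, rank the actual communicants by actual intensity, and report the fraction of the recalled top k who do not appear in the actual top k. The function name `top_k_error`, the alphabetical tie-breaking rule, and the choice of denominator (the number of people actually reported, which may be fewer than k) are assumptions made for illustration; the paper does not specify them here.

```python
def top_k_error(recalled, actual, k):
    """Fraction of the informant's reported top-k communicants who are not
    in the actual top k.

    `recalled` and `actual` map person -> intensity (messages or lines).
    Returns None when nobody was recalled. Ties are broken by name, an
    arbitrary choice made only for this illustration.
    """
    def top(intensity):
        ranked = sorted(intensity, key=lambda p: (-intensity[p], p))
        return ranked[:k]

    reported = top(recalled)
    if not reported:
        return None
    actual_top = set(top(actual))
    wrong = sum(1 for person in reported if person not in actual_top)
    return wrong / len(reported)
```

For instance, with `k=1` the function returns 1.0 whenever the person the informant names as the most frequent communicant is not the actual most frequent communicant, and 0.0 otherwise.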

So, for example, T1PF is the percentage of messages from others
recalled by the informant which in fact did not exist. And TOP5TL is
the percentage of people reported to be in the top 5 most frequently
communicated with (measured by estimated number of lines) not actually
in the top 5 (measured by actual number of lines). Virtually all of the
percentages in this study are what Tukey (1977) calls "started." For
example, instead of T1P = T1/NR, we actually use T1P = (T1 + 1/6)/(NR + 1/3)
except, of course, when NR is zero, when T1P is undefined. The specific
purpose is to make a small adjustment to all of the ratios which will
permit later transformation by logs, inverses, ratios, etc., where values
of zero cause problems.
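
The started ratio described above, (count + 1/6)/(total + 1/3), can be checked with a few lines of code. The function name `started_ratio` is hypothetical, and exact fractions are used only to make the arithmetic easy to verify; the point is that the result is never exactly 0 or 1, so logs and inverses remain defined.

```python
from fractions import Fraction


def started_ratio(count, total):
    """Started proportion (count + 1/6) / (total + 1/3).

    Returns None when `total` is zero, where the ratio is undefined.
    """
    if total == 0:
        return None
    return (Fraction(count) + Fraction(1, 6)) / (Fraction(total) + Fraction(1, 3))


# With three people recalled: no errors gives 1/20 rather than 0,
# and three errors gives 19/20 rather than 1.
no_errors = started_ratio(0, 3)
all_errors = started_ratio(3, 3)
```

The adjustment shrinks as the denominator grows, so informants who recall many people are barely affected.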

All of these measures take a value close to zero when the recall is
accurate, and increase with inaccuracy. Most measures tend to a maximum
of 1 when the recall is totally inaccurate, the exceptions being T1 and T2
(which are straight counts) and T12A (which can, and frequently does,
exceed unity).

In  the  descriptions  which  follow  (and  indeed  throughout  this  paper) 
we  shall  refer  to  the  "windows"  section  of  the  data  only  (that  is,  leaving 
out  "last-ons"),  unless  otherwise  specified. 

A  simple  comparison  of  the  number  of  people  recalled  and  the  actual 
number  of  people  communicated  with  demonstrates  the  unacceptable  level 
of error in the data. On average, 2.5 (SD 4.2) people were recalled as
being communicated with; this number ranged from 0-48 in the data. However,
6.0 (SD 10.9) people were actually communicated with. This number ranged
from 0-111. Thus, the gross underestimation of communication found in
AI-II continues to be present in these data.

The  average  values,  standard  deviations,  minima  and  maxima  of  the 
48  accuracy  measures  are  given  in  Table  3.  There  are  several  things 
which  are  immediately  apparent.  For  example,  the  levels  of  inaccuracy 
are  indistinguishable  among  the  "to,"  "from,"  and  "both"  values  within 
any  given  measure,  and  the  same  is  true  for  "messages"  and  "lines." 
Although  the  number  of  cases  involved  runs  from  almost  250  to  950 
(one  cannot  define  TOPI,  for  example,  if  no  contacts  were  recalled),  the 
only significant differences between T, F and B, or M and L, are in the
simple count measures T1 and T2, which one would expect. This is a
little surprising. We might have expected informants to better recall
"to" messages, which they initiate, than "from" messages, which are
initiated by others. This is simply not the case, as Table 3 demonstrates.
So  unless  otherwise  specified,  we  will  refer  to  measures  without  detailing 
to,  from,  both,  messages  or  lines. 

Only  a  small  number  of  people  were  recalled  who  were  not  actually 
communicated with (T1) in a given window: 0.63 (SD 2.1). On one
memorable occasion, however, 48 people were recalled (the maximum
number ever recalled, in fact) but none were spoken to. Although 0.63
is an apparently small error, as a percentage of the number of people
recalled (T1P), the error is 30% (SD 32%). Thus of those recalled, about
one-third were not communicated with.

The  figures  are  worse  if  one  examines  how  many  people  were  not 
recalled but should have been (T2). On average, 5.1 (SD 9.3) people
were forgotten, with an awesome maximum of 93. This is also a high
percentage of the number of people actually communicated with (T2P),
namely 66% (SD 78%). In other words, two-thirds of the people an
informant received messages from were forgotten.

Counting  each  occurrence  of  these  two  mistakes  as  an  error,  we  can 
count how many errors each informant makes. If the informant says he or
she talked to A, B, and C but really talked to A, B, and D, the informant
made two errors: of commission for C and omission for D. Judged as a
percentage of the number of people the informant really communicated
with (here 3), this would give an error of two-thirds, or 67%. The real
figure is rather higher, unfortunately: 79% (SD 46%). So, roughly,
four-fifths of what an informant says is wrong in some way.

Now many sociometric studies concentrate on only the main
communicants for each informant (neglecting infrequently communicated-with
people, who are, it is hoped, the main sources of the above error).
As we found in AI-II, however, it turns out that informants know their
most-frequent communicants no better than they know their other communicants.
Whether  one  examines  number  of  messages  or  number  of  lines,  or  to,  from, 
or  both,  one  finds: 

(a) more than 52% of the time, informants choose the wrong
most-frequent communicant (TOP1);

(b) more than 40% of the top three ranked communicants should not
belong in the top three (TOP3);

(c) more than 33% of the top 5 ranked communicants should not belong
in the top 5 (TOP5);

(d) if one ranks the people recalled in order of the recalled
communication, more than 45% have ranks differing by more than
2 from their position in the actual communication list (WIN2);

(e) in (d) above, more than 58% of those recalled have relative
positions in the ranked list more than 10% removed (either way)
from their relative positions in the actual communications list
(WIN20).

In other words, we cannot rely on the people an informant recalls,
or the number of messages, or the number of lines, or the people an
informant claims to speak to most, with any reliability. As a rough guide,
we have the consistent result (see also AI-II) that at least half of
what an informant says about his or her communication with others is
incorrect.

It is clearly cumbersome to refer continually to 48 separate
measures of accuracy, especially when, as we have seen, they are very
similar. To reduce the level of complexity, the results of a factor analysis
on those accuracy measures which lay between 0 and 1 were used to combine them
into general indices. (We shall return to such measures as T1 later.)
Five factors were created, each with a recognizable set of measures
comprising the main factor loading. Since each set is, furthermore,
a plausible subset of "similar" measures, we created five new overall
inaccuracy measures as follows:

ACCT    = average of (T1PT, TOP3TM, TOP5TM, WIN2TM, TOP3TL, TOP5TL, WIN2TL)
ACCF    = average of (T1PF, TOP3FM, TOP5FM, WIN2FM, TOP3FL, TOP5FL, WIN2FL)
ACC2    = average of (T2PT, T2PF, T2PB, T12ART, T12ARF, T12ARB)
ACCTOP1 = average of (TOP1TM, TOP1TL, TOP1FM, TOP1FL, TOP1BM, TOP1BL)
ACC20   = average of (WIN20TM, WIN20TL, WIN20FM, WIN20FL, WIN20BM, WIN20BL)

where "average" above is defined as follows:

if two or more of the measures in a definition have defined
values, the "average" is a simple average of the defined
values; if only one or zero of the measures in a definition is
defined, the "average" is undefined (i.e., missing).
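
The "average" rule is easy to misread, so a minimal sketch may help. It assumes that an undefined measure is represented by `None`; the function name `composite_average` is hypothetical and is not taken from the original analysis.

```python
def composite_average(values):
    """Average the defined values; None stands for an undefined measure."""
    defined = [v for v in values if v is not None]
    if len(defined) < 2:
        return None  # one or zero defined measures: the composite is missing
    return sum(defined) / len(defined)


# Two of six component measures are defined, so the composite exists.
acc2 = composite_average([0.5, None, 0.7, None, None, None])
# Only one component is defined, so the composite is missing.
missing = composite_average([0.5, None, None])
print(acc2, missing)
```

Requiring at least two defined components keeps a composite from resting on a single measure.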

The pattern of these five measures should be evident. ACCT is a
compilation of "to" measures in errors of commission, roughly speaking;
ACCF is the identical compilation of "from" measures. ACC2 involves a
composite of T2P and T12AR, and roughly measures errors of omission.
ACCTOP1 is a simple average of all TOP1 measures, and ACC20 a simple
average of all WIN20 measures.

The values of the five new inaccuracy measures reflect the values
of the 48 original variables well. ACCT has a mean of 0.46 (SD 0.31);
ACCF 0.44 (SD 0.29); ACC2 0.65 (SD 0.27); ACCTOP1 0.55 (SD 0.29);
and ACC20 0.59 (SD 0.24). These means are based on a minimum of 460
valid cases.

WINDOW    WIDTH      LAG        TIME AGO    INTERVIEWS COMPLETED

   1        30        31           60                36
   2        30         1           30                36
   3        14        47           60                35
   4        14        17           30                36
   5        14         1           14                35
   6         7        54           60                32
   7         7        24           30                34
   8         7         8           14                35
   9         7         1            7                34
  10         3        58           60                34
  11         3        28           30                36
  12         3        12           14                37
  13         3         5            7                35
  14         3         1            3                34
  15         2        59           60                36
  16         2        29           30                36
  17         2        13           14                35
  18         2         6            7                34
  19         2         2            3                35
  20         2         1            2                34
  21         1        60           60                37
  22         1        30           30                38
  23         1        14           14                33
  24         1         7            7                37
  25         1         3            3                33
  26         1         2            2                34
  27         1         1            1                37
  28      LAST ON   LAST ON      LAST ON             37
  29      LAST ON   LAST ON      LAST ON             34
  30      LAST ON   LAST ON      LAST ON             29
  31      LAST ON   LAST ON      LAST ON             25
  32      LAST ON   LAST ON      LAST ON             24
  33      LAST ON   LAST ON      LAST ON             24
  34      LAST ON   LAST ON      LAST ON             23
  35      LAST ON   LAST ON      LAST ON             23
  36      LAST ON   LAST ON      LAST ON             22
  37      LAST ON   LAST ON      LAST ON             22

TABLE 1

WINDOW LISTINGS

Width and lag are defined in the text; time ago is the time between the
interview date and the start of the window. All times given in days.


T1      The number of people recalled who were not actually communicated with.

T1P     T1/NR, where NR is the number of people recalled.

T2      The number of people not recalled who were actually communicated with.

T2P     T2/NA, where NA is the number of people actually communicated with.

T12A    (T1 + T2)/NA

T12AR   (T1 + T2)/(NR + NA). This represents the percentage of the total
        possible number of mistakes made by the informant.

TOPn    Let n be an integer (in fact n = 1, 3 or 5), and define a "hit" to occur
        whenever a person is in both the top n most intense recalled and the
        top n most intense actually. Then

            TOPn = 1 - (number of hits) / n

        hence we may define TOPnTM, TOPnTL, TOPnFM, TOPnFL, TOPnBM and TOPnBL.

WIN2    Let a "hit" mean that the rank of a person on the recalled list
        is within 2 of his or her rank on the actual list. Then

            WIN2 = 1 - (number of hits) / (number recalled)

WIN20   Let a "hit" mean that the percentile rank of a person on the
        recalled list is within 10 of his or her percentile rank on the
        actual list, so that

            WIN20 = 1 - (number of hits) / (number recalled)

TABLE 2

INACCURACY MEASURES

T, F, B refer to 'to', 'from' and 'both to and from' respectively. M and L
refer to number of messages and lines respectively. All measures are zero
for accurate recall and increase with inaccuracy.
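
The rank-based measures in Table 2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: each list is assumed to be ordered from most to least intense communication with no ties, "within 2" is read as a rank difference of at most 2, and a recalled person who is absent from the actual list is counted as a miss. The function names are hypothetical.

```python
def top_n_inaccuracy(recalled_ranked, actual_ranked, n):
    """TOPn = 1 - hits / n, where a hit is a person in both top-n lists."""
    hits = len(set(recalled_ranked[:n]) & set(actual_ranked[:n]))
    return 1 - hits / n


def win2_inaccuracy(recalled_ranked, actual_ranked, window=2):
    """WIN2 = 1 - hits / number recalled; a hit is a rank within `window`."""
    if not recalled_ranked:
        return None  # undefined when nobody was recalled
    actual_rank = {person: i for i, person in enumerate(actual_ranked)}
    hits = sum(
        1
        for i, person in enumerate(recalled_ranked)
        if person in actual_rank and abs(i - actual_rank[person]) <= window
    )
    return 1 - hits / len(recalled_ranked)


recalled = ["A", "B", "C", "D"]            # hypothetical recalled ranking
actual = ["B", "E", "A", "F", "G", "D"]    # hypothetical actual ranking
top3 = top_n_inaccuracy(recalled, actual, 3)
win2 = win2_inaccuracy(recalled, actual)
print(top3, win2)
```

In this made-up case two of the top three recalled names are also in the actual top three, and three of the four recalled names sit within two ranks of their actual position.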


Measure            Mean     S.D.     Min.     Max.

T1         T       0.70     2.0      0        45
           F       0.61     1.8      0        42
           B       0.63     2.1      0        48

T1P        T       0.37     0.34     0.01     1.0
           F       0.35     0.32     0.01     1.0
           B       0.30     0.32     0.01     1.0

T2         T       3.3      7.7      0        85
           F       3.5      5.7      0        48
           B       5.1      9.3      0        93

T2P        T       0.59     0.31     0.01     0.99
           F       0.67     0.27     0.03     0.99
           B       0.66     0.28     0.01     0.99

T12A       T       0.81     0.60     0.02     6.9
           F       0.82     0.45     0.03     4.6
           B       0.79     0.46     0.01     4.6

T12AR      T       0.44     0.26     0.01     0.99
           F       0.49     0.25     0.02     0.99
           B       0.48     0.25     0.01     0.99

TOP1  M    T       0.52     0.38     0.12     0.87
           F       0.54     0.37     0.12     0.87
           B       0.52     0.38     0.12     0.87

TOP1  L    T       0.54     0.37     0.12     0.87
           F       0.54     0.37     0.12     0.87
           B       0.54     0.37     0.12     0.87

TOP3  M    T       0.40     0.30     0.04     0.97
           F       0.43     0.32     0.04     0.96
           B       0.40     0.30     0.04     0.97

TOP3  L    T       0.42     0.30     0.04     0.96
           F       0.44     0.31     0.05     0.96
           B       0.42     0.30     0.05     0.96

TOP5  M    T       0.35     0.28     0.02     0.97
           F       0.37     0.29     0.03     0.97
           B       0.33     0.27     0.03     0.96

TOP5  L    T       0.38     0.29     0.03     0.97
           F       0.37     0.29     0.03     0.97
           B       0.36     0.27     0.03     0.97

WIN2  M    T       0.49     0.33     0.01     1
           F       0.48     0.32     0.01     1
           B       0.45     0.32     0.01     1

WIN2  L    T       0.52     0.33     0.02     1
           F       0.49     0.32     0.03     1
           B       0.47     0.32     0.02     1

WIN20 M    T       0.58     0.32     0.01     1
           F       0.58     0.31     0.01     1
           B       0.58     0.30     0.01     1

WIN20 L    T       0.62     0.31     0.01     1
           F       0.62     0.29     0.03     1
           B       0.60     0.29     0.02     1

TABLE 3

VALUES OF INACCURACY MEASURES
(Measures are defined in Table 2)


V.  The  Effects  of  Lag  and  Width  on  Accuracy  of  Recall 

The levels of inaccuracy found in the previous section are, as
hypothesized, not uniformly distributed, at least over the 27 windows
considered here. Figures 1-5 show contours of the five overall inaccuracy
measures, as functions of lag and width. (All values for a given lag
and width have been averaged, and those averages contoured. There is a
wide variation between informants.) There is a strong, but not systematic,
variation with lag and width for all five measures. Multiple correlations
of the measures on lag and width account for at best 8% of the variance
in the data (for ACC2); inclusion of quadratic terms is of little help,
yielding only 14% at best (also for ACC2).
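
To make "variance accounted for" concrete, the sketch below fits an inaccuracy measure on lag and width by ordinary least squares, with and without quadratic terms, and reports the squared multiple correlation. Everything in it is an assumption of this illustration: the inaccuracy values are invented, the function name `r_squared` is hypothetical, and the plain-Python solver stands in for whatever statistical package the original analysis used.

```python
def r_squared(predictors, y):
    """Share of the variance of y explained by least squares with an intercept."""
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, held as an augmented matrix.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# (lag, width) pairs in days, with invented inaccuracy values for illustration.
windows = [(1, 1), (7, 1), (14, 1), (30, 1), (60, 1),
           (1, 7), (14, 14), (30, 30), (8, 7), (12, 3)]
inaccuracy = [0.40, 0.62, 0.75, 0.72, 0.70, 0.45, 0.55, 0.60, 0.52, 0.58]

linear = r_squared(windows, inaccuracy)
quadratic = r_squared([(l, w, l * l, w * w) for l, w in windows], inaccuracy)
print(round(linear, 3), round(quadratic, 3))
```

Adding the squared terms can never lower the fitted share of variance; the point made in the text is that in the real data it raised it only modestly.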

The maximum values in all cases are for two- or four-week lags
(usually two) and widths of one day. As hypothesized, asking people
about "one day a long time ago" does, indeed, produce highly inaccurate
answers (at least 74% incorrect on any of the five measures). Curiously,
a lag of two months and width of one day is systematically more accurate
than two-week or one-month lags with the same width, although the
differences are not statistically significant.1 This suggests that for
such windows, informants tend to report those whom they believe they
"usually talk to." In fact, this explanation was offered by several users
of EIES in comments which they made to us on the system about the
experiment. Although our informants' technique for handling these awkward
windows (one day, sixty days ago) yields more accurate data, their data
for such windows remain at least 70% inaccurate.

Increasing the width of the window, as might be expected, increases
the accuracy, although the trends in any measure are by no means uniform.
We had anticipated that a lag and width of one day (i.e., yesterday) would
uniformly produce the most accurate data. On only two of the measures
(ACCT and ACC2) was this the case. ACC20 was the most extreme, with
greatest accuracy involving a week-long window, ending the day prior to
the interview.

Let us consider each inaccuracy measure in turn. ACCT (Figure 1)
measures people's inability to recall whom they sent messages to. This
inability tends to increase as either lag or width increases. ACCF
measures inability to recall whom people received messages from. The effects
of width are mainly confined to 1-3 days. For larger widths the inaccuracy
depends only weakly on width. ACC2 measures people's ability to invent
communicants they did not really communicate with. Here, accuracy is best
for lags of 1-2 days. For longer lags, inaccuracy increases with lag and
decreases with width. ACCTOP1 measures people's inability to recall
their most "used" communicant (in terms of either frequency or amount of
communication). For widths above three days, the measure is insensitive
to both lag and width. ACC20 measures the inaccuracy of what an informant
recalls, with little penalty for omitting communicants. Increasing lag
or decreasing width both increase the error here, although for small lags
(i.e., less than two days) the effects of width are weak.

All the cases examined so far allow the possibility of intervening
communication on EIES between the end of a window and the time of an
interview. It seems likely that this could be a major source of inaccuracy
for informants. That is, the intervening communication might be confused
by an informant with communication during a particular window.

The last-on windows were included in order to test this hypothesis.
In other words, we believed that informants might be more accurate
in reporting their communications with others the last time they used EIES
than they would be in reporting their communications during any of the 27
windows. Indeed, this is the case. The five inaccuracy measures, computed
for last-on interviews only (with a minimum of 97 cases), have the following
mean values: ACCT 0.16, SD 0.35; ACCF 0.31, SD 0.32; ACC2 0.48, SD 0.34;
ACCTOP1 0.37, SD 0.30; ACC20 0.43, SD 0.32. In each case, these values
are more accurate than the corresponding value for the 27 windows.

It is not clear how to decide whether these values are significantly
better, due to the many contributory factors involved (not the least of
which are the persistent strong differences in accuracy between informants).
A naive t-test between pairs of means shows significant** differences
in every case. (Henceforth, single asterisks denote significance at the
5% level or better; double asterisks denote significance at the 1% level
or better.) Now, 80% of all last-on interviews involve lags of at most
two days, whereas the average windowed lag is 20 days. Thus, the last-on
inaccuracies would be expected to be less than regular window inaccuracies,
due to this fact alone. Restricting attention to windows and last-ons
possessing identical lags and widths, the results continue to be significantly**
more accurate for last-ons.
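
A comparison of two means of this kind can be approximated from the reported summary statistics alone. The sketch below uses Welch's unequal-variance t statistic, which is an assumption of this illustration; the text does not say which form of t-test was used. The case counts are the reported minimums (460 and 97), so the result is indicative only.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for two independent means from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# ACC2 for the 27 windows (mean 0.65, SD 0.27, at least 460 cases) against
# ACC2 for last-on interviews (mean 0.48, SD 0.34, at least 97 cases).
t_acc2 = welch_t(0.65, 0.27, 460, 0.48, 0.34, 97)
print(round(t_acc2, 2))
```

A statistic of this size would be significant at the 1% level for samples this large, though, as the text cautions, the test ignores the strong differences between informants.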

Is last-on accuracy affected by lag? Multiple regression of the
inaccuracy measures for last-ons on lag (and order of presentation, to
illuminate a possible learning effect) accounts for, at most, 15% of
the variance (in this case for ACCF). So, informants are not systematically
more accurate for shorter lags, even for last-on communication. In
fairness, the 15% of variance accounted for in ACCF is significant**, but
the scatter implied by this low figure is sufficiently great to invalidate
the use of very short lags in order to obtain accurate results.

Although last-on inaccuracy is less than window inaccuracy, it is
clearly still too large for reliable use of recall data in network
studies. In order to improve accuracy still further, we examined the 93
last-on windows which had a lag of zero days. In other words, informants
for each of these 93 interviews had used EIES earlier the same day as their
interview. In fact, they had logged off EIES no more than 20 minutes before.
One would assume that informants would be highly accurate, given that
they were being asked to recall their communications such a short time
ago. The results were quite mixed. Some people, as usual, are very
accurate, while others are not. For example, of the 35 cases in which
ACCTOP1 could be computed, 20 were correct. However, the mean inaccuracies
remain unacceptably high: ACCT has a mean of 0.30, SD 0.33; ACCF 0.21,
SD 0.26; ACC2 0.42, SD 0.35; ACCTOP1 0.34, SD 0.29; ACC20 0.38, SD 0.29.
Surprisingly, only ACCF is significantly* less inaccurate for same-day
last-ons than for last-ons with a lag of one or more days.

Given the very short lags for same-day last-ons (i.e., no more than
20 minutes) we can examine how inaccuracy varies in very short time
intervals. The scatter still remains too high to account for variations
in inaccuracy. Multiple regression of the five measures on lag and width
(now measured in minutes) still only accounts for, at best, 18% of the
variance (in this case, for ACC2). In no case is a significant amount
of variance accounted for. As an indication of the scatter involved,
note that of 5 interviews conducted just one minute after the informant
had last been on EIES, on two of these occasions ACCT had values larger
than 0.87, and on three occasions values less than 0.2. The predominant
factor determining accuracy is simply wide variation amongst informants.
Some people are fairly accurate, while others are grossly inaccurate.
We will examine these differences further in Section VI.

Recapitulating, a researcher asking for communications data could
expect the most accurate results from data on very recent time windows.
However, there is no way to know a priori what width the window should be
for greatest accuracy. It is highly plausible that more recent events
should be recalled more accurately than less recent events. But, while
hardly surprising, these results are not trivial. Consider that data
on a lag of two days and a width of one day are distinctly less accurate
than data on a lag and width of one day. Hence, the exact positioning
of the window in time has an extreme effect on the accuracy of the data
acquired: even tiny alterations in the lag or width of the window
produce large alterations in the accuracy.

Nor are these results very comforting. The most accurate value,
for any non-last-on window, of each of the five measures, still yields
36% inaccuracy, on average. Arguably, this could be counted as 64%
accurate data; however, (a) there is no way to know which data are
accurate and (b) recall that all cases when either NR or NA (the
number of people recalled and actually communicated with) is zero have
been excluded from consideration; these are also highly inaccurate.
(Including values of zero for NR or NA would yield infinite values of
inaccuracy. Removing those values, however, only serves to raise
artificially the level of accuracy. Section VII discusses this case in detail.)

Still, some researchers might choose to interpret this finding as
an encouraging sign that asking people who they talk to (and/or how much
they talk to others) can yield data which are sufficiently accurate for
further manipulation. We would consider such an interpretation unproductive
for the following reasons. First, consider that the minimum
value of ACCTOP1, over any window, is 0.32. This simply means that,
for the most accurate window (in this case lag one, width two), on
32% of the occasions informants could not name correctly the person with
whom they communicated most frequently. Second, to repeat, there
is no way for a researcher taking data to choose the "most accurate window"
for any given study. Even if this were possible, the researcher would
have to settle for less inaccuracy of one kind at the cost of getting
higher inaccuracy of other kinds. Finally, the most accurate source
of data is on windows with a lag of a few minutes. But researchers
collecting data in the field would themselves have been present during
these "more accurate" windows. Thus, at best, they would have been
able to observe communication directly (in which case, why ask for
data from informants?); and, at worst, their presence will have
modified the communications being measured.

VI.  What  Else  Accounts  For  Inaccuracy?

We have seen that the dependence of accuracy on the lag and width
of time windows is not strong. Clearly, other variables are contributing
to informant inaccuracy. Some of these variables are presumably
functions of the personal history and qualities of each informant.
Some informants have better memories than others, for example; some use
EIES more frequently than others; and so on. Some variables may be a
function of the particular window under consideration. Perhaps the
window involved a lot (or very little) message traffic; perhaps the
informant was in a hurry when being interviewed; or perhaps the informant's
first few interviews were less accurate than later ones.

During the background interview, we asked each informant how
well, on a scale of 1-7, he or she could remember each of the following:
zip codes, phone numbers, names, faces, dates, lyrics, and birthdays.
Perhaps an informant's self-evaluation of memory is related to his or
her accuracy in recalling communication. At the end of each window
interview, informants also provided estimates of their confidence, on
a scale of 1-7, about their recall of the following: list of communicants,
number of messages sent, number of messages received, number of lines
sent, and number of lines received. Both the memory and the confidence
measures averaged around 4, as might be expected. Since these variables
are too highly intercorrelated to use separately in regressions, we
factored each set. This produced three memory variables: the average of
names and faces; birthdays; and phone numbers. A similar factoring
on confidence measures reduced them to two: confidence in the list of
communicants; and the average of the other four.

Surprisingly, the memory variables were almost uncorrelated with
the five inaccuracy measures; however, the two confidence measures were
reasonably correlated (r = -0.2 to -0.3) with inaccuracy. Of course,
the lack of correlation of memory and inaccuracy could be produced by
other, more subtle cross-correlations. Accordingly, a large number of
variables was entered in a multiple correlational search to find the
predictors of accuracy. In the search, at various levels of inclusion,
were: sex and age of informant; number of people recalled ("to," "from,"
and "both"); time to take the window; total time ever spent on EIES
by the informant; lag, width; number of people communicated with (again
for the three categories); the three memory variables; the two confidence
variables; the number of times "feedback" had been used by an informant to
check previous accuracy; and the order of presentation of the window.

Little variance was accounted for, even by such a list of variables.
Eighteen percent of the variance of ACCT was accounted for, mainly by
number of communicants "to" (recalled and actual), and both confidence
measures. Only 15% was accounted for in ACCF, by number "from" (recalled and
actual), lag, and confidence in messages and lines. ACC2 was best accounted
for (37%), by number of recalled communicants and confidence in that list.
ACCTOP1 had 16% accounted for, by total time ever spent on EIES and
confidence in list of communicants; ACC20 had 22% accounted for, by number
of actual communicants and confidence in messages and lines.

An extra attempt was made by inventing such variables as effort
(time taken during window per communicant recalled), and activity
during window (number recalled per day of width). Again, logical and
empirical transformations of the data were made to improve the fit.
The conclusions of this section still hold. In short, everything we
have measured seems to be related to inaccuracy in a reasonable way. The
problem is that nothing seems to matter very much.

VII.  The  Special  Case  of  No  Communication 

A special case of these calculations occurs when NR or NA is zero
(i.e., when an informant claims he spoke to no-one or when she actually
spoke to no-one). This case automatically removed many inaccuracy measures
from previous consideration as they could not be defined.

On 29% of occasions, in fact, an informant had no actual communication
during the window under consideration. And on 28% of occasions an
informant recalled communicating with no-one. If these two sets of
occasions completely overlapped, the informants would always be accurate
when they claimed not to speak to anyone.

The overlap is, of course, imperfect. On 41% of those occasions when
an informant recalled having no communication, he or she did in fact have
communication; and on those occasions she or he communicated with an
average of 4.8 different people. Similarly, on 19% of occasions when
informants actually had no communications during a particular window,
they claimed, on average, to have communicated with 2.1 different people.
Consistently, in all our work, we have found that errors of omission are
more severe than those of commission.
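
The conditional percentages in this section can be computed directly from one (NR, NA) pair per interview. The sketch below is illustrative: the function name `zero_case_rates`, the dictionary keys and the sample pairs are all invented for this example.

```python
def zero_case_rates(occasions):
    """occasions: list of (NR, NA) pairs, one per window interview."""
    actual_when_none_recalled = [na for nr, na in occasions if nr == 0]
    recalled_when_none_actual = [nr for nr, na in occasions if na == 0]
    omitted = [na for na in actual_when_none_recalled if na > 0]
    invented = [nr for nr in recalled_when_none_actual if nr > 0]
    return {
        # Share of "recalled nobody" occasions that did involve communication.
        "share_omission": (len(omitted) / len(actual_when_none_recalled)
                           if actual_when_none_recalled else None),
        "mean_omitted": sum(omitted) / len(omitted) if omitted else None,
        # Share of "no actual communication" occasions where someone was named.
        "share_commission": (len(invented) / len(recalled_when_none_actual)
                             if recalled_when_none_actual else None),
        "mean_invented": sum(invented) / len(invented) if invented else None,
    }


sample = [(0, 0), (0, 5), (0, 0), (2, 0), (3, 4), (0, 3), (0, 0), (1, 1)]
rates = zero_case_rates(sample)
print(rates)
```

In the invented sample, two of the five "recalled nobody" occasions hide real communication, and one of the four "no communication" occasions carries an invented communicant.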

Most of these figures are well-predicted** by the width (but not
the lag) of the window under consideration. Both the percentage of times
a mistake occurs, and the number of omitted or committed communicants,
increase strongly with width, with correlations of the order of 0.7 to
0.8. Only the mean number of commissions (given a commission occurred)
is weakly described by width (r = 0.27**). Hence, the longer the time
over which informants recall their interaction, the more errors of
omission or commission are made by those informants.

VIII.  What  Is  The  Best  We  Can  Do? 

It is already clear, both from the preceding sections and from AI-IV,
that data from informants about their communications, over any time period,
are unreliable. Given this, are there any positive statements which could
be made? This and the next two sections are attempts to find specific
rules for treating the data so as to yield reliable results. This section
examines whether one can predict the list of people communicated with, given
only informants' recall.

The situation is difficult, as Table 4 demonstrates. One might
arguably be able to find some rules to predict the 0.63 people not
communicated with but recalled; but it is unclear how to predict who the
5.1 people are who are not recalled but were communicated with. (The
entry in the lower right-hand corner depends on the size of population
involved and is not easy to define; the number involved is obviously large,
but defining the entries here to be "accurate" hardly helps the
situation.)


Let us first seek to predict the numbers in Table 4. (The equivalent
tables for "to" and "from" are equally predictable, and omitted here, as
are "last-on" cases, which are much more scattered.) We are given only NR
(number recalled) plus the information detailed in Section VI. Now NA can
be predicted to 64%** of its variance, overwhelmingly by a linear function
of NR, whose coefficient is about 1.44; the underestimation is typical of
all our data sets. Since

     NR = a + b

is known, and

     NA = a + c

is well predicted, only one more quantity needs to be predicted to define
a, b and c. In fact T1 (i.e., b) and T2 (i.e., c) can also be predicted,
the former to 36%** (again by a linear function of NR) and the latter to
52%**, by NR and total time ever spent on EIES. As a result, a, b and c are
all predicted by linear functions of NR, with coefficients 0.68, 0.32 and 0.77
respectively.
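
The reported coefficients can be checked for internal consistency. The sketch below assumes purely proportional predictions with no intercepts, since no intercepts are reported; the function name `predict_cells` is hypothetical. Under that assumption a + b returns NR exactly, and a + c gives about 1.45 NR, close to the coefficient of about 1.44 quoted for NA.

```python
def predict_cells(nr, coef_a=0.68, coef_b=0.32, coef_c=0.77):
    """Predict the Table 4 cells a, b, c from the number recalled (NR).

    Assumes zero intercepts, which the text does not report.
    """
    a = coef_a * nr  # recalled and actually communicated with
    b = coef_b * nr  # recalled but not communicated with (T1)
    c = coef_c * nr  # communicated with but not recalled (T2)
    return {"a": a, "b": b, "c": c, "NR": a + b, "NA": a + c}


cells = predict_cells(10)
print(cells)
```

Because the intercepts are unknown, this reproduces only the slopes; it does not recover the cell means shown in Table 4.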

Predictability of numbers of people in various categories, of course,
is of little help to a researcher concerned with mapping the communication
structure of a group. The researcher needs to know which people fall into
the four categories. Is there some rule which would enable the researcher
to obtain recall data from an informant and then to select some of those
communicants and be sure they were in category (a), i.e., were actually
communicated with? We are not here requiring a rule which specifies the
entirety of category (a), merely a reliable subset: no member of category
(b) is to be allowed. Given the high level of inaccuracy involved, this
is clearly the best one might hope for.


                                      Behavior

                          communicated         not communicated
                              with                   with

Recall   communicated         2.14                   0.63
             with             (a)                    (b)

         not communicated     5.10
             with             (c)                    (d)

TABLE 4

Accuracy contingency table

The entry in each box is the mean number of communicants for that
box: e.g. 5.1 people were communicated with but not recalled. The
lower right entry cannot easily be defined.

There are two ways this might be achieved. Obviously the rule must
involve selecting those people an informant reported communicating with
most frequently. The chances are slim at best that someone would
be reported as spoken to only rarely and yet be consistently in category
(a). The simplest rule, then, is to define some (small) integer n and
specify that the people reported as spoken to first, second, ..., nth
most often are actually spoken to. Recall that there may be other actual
communicants; this rule would not seek to find them.

Let us define an inaccuracy "score" which is rather similar to T2P.
For a given n, the score is the ratio (undefined when both NR and NA are
zero)

     score = (number of those in category (b) predicted by the rule)
             / min(n, number of reported communicants)

The rule is accurate when the score is zero, and totally inaccurate when
the score is unity. When n exceeds the number of reported communicants,
all communicants are selected by the rule.
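
The score can be sketched in a few lines. The example assumes the recalled list is ordered from most to least reported communication; the function name `top_n_rule_score` and the sample names are invented, and returning `None` whenever nobody was recalled is a simplification of this sketch (the denominator is zero in that case).

```python
def top_n_rule_score(recalled_ranked, actual, n):
    """Share of people selected by 'take the top n recalled' in category (b)."""
    selected = recalled_ranked[:n]  # everyone recalled when n exceeds the list
    if not selected:
        return None  # nobody recalled: the ratio has a zero denominator
    wrong = sum(1 for person in selected if person not in actual)
    return wrong / len(selected)  # len(selected) == min(n, number reported)


recalled = ["C", "A", "B", "D"]   # hypothetical recall, most frequent first
actual = {"A", "B", "E"}          # hypothetical actual communicants
scores = {n: top_n_rule_score(recalled, actual, n) for n in (1, 2, 3, 4, 10)}
print(scores)
```

Here the top-ranked recalled person was never communicated with, so the score starts at unity for n = 1 and falls as n grows, and any n beyond the four recalled names gives the same score as n = 4.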

Somewhat surprisingly the score almost always decreased monotonically
with n. A peak in inaccuracy usually occurred for very low n, suggesting
that the frequent restriction by sociometricians to an informant's "top 3"
choices may be dangerous. In fact the median value for the most inaccurate
cutoff n for this rule turns out to be n = 2, where the score takes
an average value of 79%. In other words, 79% of the people
selected by "use the top 2 recalled communicants" are not spoken to!

Because of the improvement in accuracy by increasing n, the optimal
rule involves selecting all recalled communicants as being actual
communicants. However, this still yields 19% inaccuracy. Thus,
although this is the most accurate version of the rule, it is unreliable
once in every five occasions, and clearly unacceptable.

The second possible method would be to modify the cutoff used. It
might be argued that only those individuals perceived as "communicated
with a great deal" should be included by the rule. In other words, the
inclusion rule ceases to be relative ("take the top 5," etc.) and becomes
absolute ("choose all those recalled as having more than x communication"
for some x).

We chose to make the cutoff point be a function of informant. Each
informant's total communication was scanned, and the maximum number of
messages and lines was recorded over all windows and all communicants.
The selection rule then became "choose a recalled communicant only if the
amount of recalled communication (messages or lines) exceeds x% of that
informant's maximum communication." What value should x take in order to
achieve totally reliable data?

Unfortunately, x needs to be 100 percent (and the data are not
reliable even then). Figure 6 shows histograms of the required cutoffs,
over the informants. The largest peaks are in the 91-100 percent band,
indicating that for at least twelve informants any rule of this type would
be spurious. There is a cutoff of 10 percent or less for only 6 informants.
In general, the scatter in Figure 6 is too great to produce a
reliable rule.
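
One way to read the "required cutoff" for a single informant is sketched below. This is an assumption-laden illustration rather than the authors' procedure: the function name, the dictionary input, and the choice to report values above 100 percent uncapped are all hypothetical.

```python
def required_cutoff_percent(recalled_amounts, actual, informant_max):
    """Smallest x (as a percent of the informant's maximum communication)
    for which the rule "keep a recalled communicant only if the recalled
    amount exceeds x%" selects nobody who was not actually communicated with.

    recalled_amounts: dict mapping communicant id -> recalled messages or lines.
    actual: set of ids actually communicated with.
    informant_max: the informant's maximum observed communication (assumed > 0).
    """
    # Recalled amounts attached to people who were never communicated with.
    false_amounts = [amount for person, amount in recalled_amounts.items()
                     if person not in actual]
    if not false_amounts:
        return 0.0  # every recalled communicant is real; no cutoff is needed
    # The cutoff must sit at or above the largest falsely recalled amount.
    return 100.0 * max(false_amounts) / informant_max
```

Under this reading, a result above 100 would mean that no cutoff of this type is reliable for that informant.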

Nor is the situation improved by considering the numerical values of
the cutoffs rather than their percentage values. Eighty percent of these
cutoffs lie in the lowest 10 percent of the message or lines traffic. For
example, the cutoff for 41 informants involved fewer than 10 messages for
total reliability; for 8 informants (16 percent of those for whom the
calculation could be performed) the cutoff was two messages or less for
total reliability.

We are forced to conclude that there is no reliable way to select a
subset of those recalled who are actually communicated with. If we select
only those communicants with reported communication more than 90 percent
of the maximum ever achieved (a very stringent criterion), no less than 25
percent of the time the data are wrong.

IX. Global Statistics

Many of the results presented so far have been based on dyadic
measures; that is, two people are involved: an informant and a communicant.
In our previous papers (AI, III, IV) we analyzed higher level data,
including triads and n-ads or "cliques." The data were progressively
more inaccurate as the level of structure became more complex. Because
data in this paper were taken from a small subset of a closed group,
repeating the analyses at the triadic or clique levels would be fruitless.
However, this does not invalidate the less stringent task of searching
for similarities in the global structures of recall and behavioral data.
This section investigates "net popularity," and the structural equivalence
of the two data sets.

a)  Popularity 

Interest in locating the most popular persons in a group goes back
to the beginnings of sociometry. Most groups appear to have a small subset
of their members who are communicated with significantly more often than
others in the group. Although informants' recall is poor at the dyadic
level, do they nonetheless "know" who the popular members are in the group?
We tested this in two ways.

In the first method, we estimated the actual popularity of each
member of EIES by adding up all the messages/lines ever sent by the
informants in any of the windows to that member of EIES. For these purposes,
there are 364 members of EIES. Due to temporal overlap of some of the
windows, the results may be slightly, but unavoidably, biased. We ranked
the top 20 of the 364 in order of communication, by both messages and lines.
A similar procedure was carried out for recall data, and the two sets of
ranks were compared. Here the results are rather encouraging. The
person in EIES who is communicated with most (messages or lines) is the
fourth most popular person in the recall data. Nonetheless, the four most
popular people in the behavioral data are the same as the four most popular
people in the recall data, but in the wrong order. (The consistent
underestimation continues; both lines and messages are underestimated by about
50%.) Even the top 10 seem reasonable: only one or two of the behavioral
top 10 are omitted in the recall data.

The same results held when we restricted our attention to a subset of
the data. Instead of recording all messages from an informant to the entire
population of EIES, we recorded only the communication (actual and reported)
for each of our informants to the n persons on EIES with whom each informant
communicated first, second, ..., nth most often during a given window.
Here n takes the values 1, 3, or 5. Precisely similar results are found.

In other words, informants may not know who they speak to the most;
but they appear to know, in general, who is most spoken to.
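
The first popularity comparison can be sketched as follows: sum the traffic received by each member, take the top k under each data set, and compare the two lists. The record layout (sender, receiver, amount), the function names, and the alphabetical tie-break are assumptions made for illustration only.

```python
from collections import Counter

def popularity(records):
    """Total messages (or lines) received by each member.

    records: iterable of (sender, receiver, amount) tuples (assumed layout).
    """
    totals = Counter()
    for _sender, receiver, amount in records:
        totals[receiver] += amount
    return totals

def top_k(totals, k):
    """The k most popular members, highest total first.

    Ties are broken alphabetically so the result is deterministic
    (an arbitrary choice for this sketch).
    """
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [person for person, _ in ranked[:k]]
```

Comparing `set(top_k(behavior_totals, k))` with `set(top_k(recall_totals, k))` checks whether the same people appear in both lists, while comparing the lists themselves checks whether they appear in the same order.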

In the second method, we examined the popularity of our informants
rather than of EIES in general. This time we counted incoming messages from
all persons on EIES to our informants (again, both messages/lines and
behavior/recall data). We ranked the informants in order of popularity,
and we obtained results similar to those obtained in the first method.

The first three informants (ranked by messages) are the same for both
behavior and recall, though in the wrong order. The most popular person
(ranked by lines) was the same for both behavior and recall. Although
the second most popular person in behavior was ranked sixth in recall
(again for lines), the top six were the same in both cases.

Similar results are found by restricting attention to the top
1, 3, or 5 communicants, although the resulting most-popular person is
never the same for recall and behavior.

b) Structural equivalence

Although informants are inaccurately recalling their communication
at many levels, we showed above that they have an accurate "feel" for
the popular members of the group. Do they in fact recall accurately the
relative positions of themselves and others in the group? In other words,
how equivalent are the structures present in behavioral and recall data?
(Again, the small subset of the group comprising the informants precludes
other analyses such as centrality and the like.)

The strong inaccuracy at the dyadic level suggested that any
comparison between behavior and recall at all but the simplest level
would probably fail. Hence we simplified both behavioral and recall
data to a (57x384) matrix m_ik where

\[
m_{ik} =
\begin{cases}
1 & \text{if } i \text{ ever communicated with } k \\
0 & \text{otherwise}
\end{cases}
\]

We then defined three (57x57) matrices on the subset of our informants.

The first is a simple symmetric distance measure d_ij [equation missing
in source], where the sum is taken over all k in the entire group, and the
zero diagonal value is for later convenience. Thus d_ij is small when i and j
are "similar" and large when i and j are "dissimilar."

The second and third matrices are "substitutability" measures
s_ij and t_ij. Both measure how well i and j can substitute for each other
in terms of their patterns of communication. The s_ij matrix is symmetric,
by dividing the intersection of i's and j's communication by the union:

\[
s_{ij} =
\begin{cases}
\dfrac{\sum_k m_{ik} m_{jk}}{\sum_k \left( m_{ik} + m_{jk} - m_{ik} m_{jk} \right)} & \text{if denominator} \neq 0 \\[2ex]
1 & \text{if denominator} = 0
\end{cases}
\]

where the 1 indicates perfect substitutability if i has no communication.
The t_ij matrix is asymmetric, by normalizing by i's total communication:

\[
t_{ij} =
\begin{cases}
\dfrac{\sum_k m_{ik} m_{jk}}{\sum_k m_{ik}} & \text{if denominator} \neq 0 \\[2ex]
1 & \text{if denominator} = 0
\end{cases}
\]

These last two matrices increase with i's similarity to j; the first, d_ij,
decreases with i's similarity to j. All have zero diagonal values.
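
A minimal NumPy sketch of the three measures follows. The s and t matrices follow the verbal definitions (intersection over union; intersection over i's own total). The form used for d, the number of members k with whom exactly one of i and j communicated, is an assumption made for illustration, since the defining equation for d_ij is not legible in the source. The function name is hypothetical.

```python
import numpy as np

def structure_matrices(m):
    """Return (d, s, t) for a binary informant-by-member matrix m.

    m[i, k] = 1 if informant i ever communicated with member k, else 0.
    """
    m = np.asarray(m, dtype=int)
    inter = m @ m.T                                  # shared contacts of i and j
    row = m.sum(axis=1)                              # total contacts of i
    union = row[:, None] + row[None, :] - inter      # contacts of i or j

    # ASSUMPTION: d is a Hamming-type distance, sum over k of |m_ik - m_jk|.
    d = (union - inter).astype(float)

    # s: intersection / union, set to 1 where the denominator is zero.
    s = np.ones(inter.shape, dtype=float)
    np.divide(inter, union, out=s, where=union != 0)

    # t: intersection / i's total, set to 1 where the denominator is zero.
    denom = np.broadcast_to(row[:, None], inter.shape)
    t = np.ones(inter.shape, dtype=float)
    np.divide(inter, denom, out=t, where=denom != 0)

    # All three matrices carry zero diagonals, as stated in the text.
    for matrix in (d, s, t):
        np.fill_diagonal(matrix, 0.0)
    return d, s, t
```

Running this once on the behavioral matrix and once on the recall matrix gives the pairs of matrices that are compared in what follows.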

We may now compare behavioral and recall versions of each matrix
by the Γ measure introduced by Katz and Powell (1951) and extended by
Hubert and Baker (1978). Γ is no more than the correlation coefficient
between the behavioral and recall entries of d_ij, s_ij, or t_ij. Its
significance can then be tested by Mantel's strategy (see Hubert and Baker).
This examines whether relabeling the 57 informants in the recall matrix would
produce a significantly better or worse fit to the un-relabelled behavioral
matrix. Hubert and Baker provide an approximate Z-score for Γ [(mean -
expected mean) / standard deviation] together with a pessimistic estimate
of significance level, Q. The Z-score of course yields an optimistic
level; above 1.96 the results are significant. Monte Carlo simulations would
be necessary if the results showed conflicting significance estimates.

The results for the three matrices are:

    d_ij:  r = 0.64;  Z = [illegible];  Q = 3.7%
    s_ij:  r = 0.30;  Z = 9.1;          Q = 1.2%
    t_ij:  r = 0.39;  Z = [illegible];  Q = [illegible]

In all cases the degree of structural agreement between behavior and recall
is at least significant, with very high Z scores. So the behavioral
and recall matrices possess similar signals. However, the detailed
agreement is rather poor: the variance accounted for in the behavioral
data by the recall data (the square of r) is 41%, 9%, and 15% respectively.
In other words, one data set could not be used as a proxy for the other.
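
The relabeling test can be illustrated with a Monte Carlo version of Mantel's strategy. This sketch substitutes random permutations for the analytical approximation of Hubert and Baker, so it is not their formula; the function names, the number of permutations, and the one-sided p-value are assumptions.

```python
import numpy as np

def offdiag(a):
    """Off-diagonal entries of a square matrix, as a flat vector."""
    mask = ~np.eye(a.shape[0], dtype=bool)
    return a[mask]

def mantel_test(behavior, recall, n_perm=999, seed=0):
    """Correlate two square matrices and assess the result by relabeling.

    Rows and columns of the recall matrix are permuted together, which
    corresponds to relabeling the informants. Returns the observed
    correlation, a Z-score against the permutation distribution, and a
    one-sided permutation p-value.
    """
    rng = np.random.default_rng(seed)
    b = np.asarray(behavior, dtype=float)
    r = np.asarray(recall, dtype=float)
    observed = np.corrcoef(offdiag(b), offdiag(r))[0, 1]

    null = np.empty(n_perm)
    for p in range(n_perm):
        perm = rng.permutation(r.shape[0])
        relabeled = r[np.ix_(perm, perm)]   # same relabeling on both axes
        null[p] = np.corrcoef(offdiag(b), offdiag(relabeled))[0, 1]

    z = (observed - null.mean()) / null.std(ddof=1)
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, z, p_value
```

The square of the observed correlation gives the share of variance accounted for, which is why a correlation can be highly significant under relabeling and still be a poor proxy.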

In summary, then, at a global level there is reasonable agreement
between recall and behavior. Recall data yields a list of "popular"
people which is very similar to the list produced by behavioral data.
Similarity and dissimilarity measures between informants show considerable
correlation between behavior and recall data, but recall accounts for
insufficient variance in behavioral data for it to be used as any kind of
predictor.


X.  Can  We  Calibrate  the  Recall  Data? 

Implicit in most empirical studies is the concept of cost-effectiveness.
How much will it cost to collect good data, and will it be worth it? The
two specific extreme choices in our case are (a) use inexpensive measures
of message traffic, such as recalled messages to (RMT) and recalled messages
from (RMF) some person, and collect large amounts of data; or (b) use costly,
direct observational measures of message traffic, in our case the actual
number of messages to (AMT) and from (AMF) some person. This is only
feasible on a small dataset. Typical research projects in network analysis
use economical but inaccurate measures. In this section we suggest and
demonstrate a technology that may help improve the accuracy of the cheap
measure for a few extra dollars.

In the data sets we work with, we purposely record the expensive
measures and the inexpensive measures for all the cases. (In fact, we chose
our research population because the observational measures, usually so
expensive, are cheap.) One simple and general measure of the accuracy of
the cheap measure is the mean square error, in this case

\[
\mathrm{MSE}(\mathrm{RMT}) = \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{RMT}_i - \mathrm{AMT}_i \right)^2
\]

where the sum runs over the N cases in the data set.

To "improve" RMT, we adapt what in sampling theory is called Regression
Estimation. Suppose that in a large data set some concept is measured
inaccurately (the usual case). Regression estimation proceeds as follows:

1.  Choose  a  small,  simple  random  sample  of  cases  from  the  data  set. 

2.  Measure  (again)  each  case  in  the  sample  using  the  expensive, 
accurate  measure  (AMT  or  AMF  in  our  case). 

3. Using any and all cheap measures and statistical tricks, develop a
prediction equation for the accurate measures. (In our case, AMT
is a function of RMT, RMT², number of people recalled, effort,
lag, width, perceived activity, experience using EIES, and several
interactions of similar variables.) Since this is a simple random
sample of the data set, the prediction equations should generalize to
the data set.

4. The independent variables in the prediction equation are all cheap
(by our design) and have been measured for all cases in the data set.
Call the predicted value for each case in the data
set "corrected RMT," or CRMT. In other words, RMT is corrected for
bias and various individual characteristics by using the relationship
between AMT and RMT, effort, etc. in the sample.
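
The four steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' regression: the data-generating values, the use of a single predictor, and the names `mse` and `regression_estimate` are all assumptions made for the example.

```python
import numpy as np

def mse(estimate, truth):
    """Mean square error of an estimate against the accurate measure."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.mean(diff ** 2))

def regression_estimate(cheap, accurate_sample, sample_idx):
    """Steps 3 and 4: fit on the sample, then predict for every case.

    cheap: (N, p) array of cheap measures, available for all cases.
    accurate_sample: accurate measure for the cases listed in sample_idx.
    """
    design = np.column_stack([np.ones(len(cheap)), cheap])   # add intercept
    coef, *_ = np.linalg.lstsq(design[sample_idx], accurate_sample, rcond=None)
    return design @ coef                                     # corrected values

# Synthetic illustration only; these numbers are not from the experiment.
rng = np.random.default_rng(0)
N = 1000
amt = rng.poisson(10, N).astype(float)            # "actual" messages
rmt = 0.5 * amt + rng.normal(0.0, 2.0, N)         # noisy recall, about half of actual

sample_idx = rng.choice(N, size=50, replace=False)            # step 1: random sample
crmt = regression_estimate(rmt[:, None], amt[sample_idx], sample_idx)  # steps 2-4

print("MSE of raw recall:      ", mse(rmt, amt))
print("MSE of corrected recall:", mse(crmt, amt))
```

In this synthetic setting the correction removes the systematic underestimate, so the corrected measure has the smaller mean square error; how much is gained in practice depends on how well the cheap measures predict the accurate one.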

Statistical theory suggests that the corrected RMT in the entire data set
will be a better proxy for AMT than uncorrected RMT. In our data set we can
assess this claim directly, since we know AMT. The accuracy of CRMT
is therefore

\[
\mathrm{MSE}(\mathrm{CRMT}) = \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{CRMT}_i - \mathrm{AMT}_i \right)^2
\]

with the sum taken over the N cases in the data set.

The relative accuracy of CRMT and RMT in measuring AMT is, for our data,
[ratio of MSE(CRMT) and MSE(RMT); value illegible in source].

The same result for CRMF and RMF is
[ratio of MSE(CRMF) and MSE(RMF); value illegible in source].

The corrected RMT and RMF are roughly 20% better than the raw
measures. While this might encourage some, it is not really as good as
it might be. Being 20% better than awful is not good; it is medium bad.
Still, if the project must go on, there are two alternatives. The
researcher must choose to a) measure N1 cases at c1 dollars per case or
b) measure N2 cases at c1 dollars per case and n2 cases at c2 dollars
per case, where c2/c1 is large and n2/N2 is small. N1 and N2 are about
equal. For example, instead of 1000 cases at $1 per case, one could collect
750 cases at $1 per case and 50 cases at $5 per case. The total cost
is the same. But if the 50 case sample can be used to improve the accuracy
of the data set by a factor of more than √1000/√750 = 1.15, then the final
results from plan (b) should be much more accurate in the long run.
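
The arithmetic of this example can be checked directly. Reading the break-even factor as a ratio of standard errors, which shrink in proportion to one over the square root of the number of cases under simple random sampling, is an interpretation assumed here rather than something the text states.

```latex
\begin{align*}
\text{Plan (a):}\quad & 1000 \times \$1 = \$1000 \\
\text{Plan (b):}\quad & 750 \times \$1 + 50 \times \$5 = \$750 + \$250 = \$1000 \\
\text{Break-even factor:}\quad & \sqrt{1000/750} = \sqrt{4/3} \approx 1.15
\end{align*}
```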

Calibration of the recall data in this paper unfortunately yielded abysmal
results, but this may be because we failed to put the right quantities into
the regressions. We will have more to say about the implications of this
in the conclusions.

XI.  Conclusions 

In an effort to determine how much lag and width of a time window
affected communication recall, we designed a totally automated experiment.
The experiment took advantage of a new communications medium (computer
conferencing) which enabled us to monitor automatically all interactions
involving a subset of the computer network. In previous experiments
we had found little which accounted for the gross inaccuracy in human
recall of communication. We believed that the concepts of lag and width
might prove helpful.

Although lag and width account for some of the variation in accuracy
(small lags and widths tended to be more accurate than large ones), the
amount of variance accounted for is small (typically about 10%).
Consideration of a wide variety of other variables still failed to account
for most of the variation in accuracy (never more than 37%, and usually
less than 20%).

Nor are people more accurate when they recalled communicating with
nobody. On 41% of such occasions, communication had taken place, with
4.8 different people, on average.

Only one positive statement can be made about accuracy from our
results. Although individual people do not know with whom they communicate,
people en masse seem to know certain broad facts about the
communication pattern. Specifically, if we examine the aggregate of
what everybody says about their communications with everybody, the
resulting "most-frequently-communicated-with" members of the group turn
out to be correct. That is, the list of the top six most "popular"
people is the same for both recall and behavioral data.

All other findings were negative. It is impossible, for example,
to produce an accurate list of those with whom an informant has communication,
given his or her recalled list together with estimates of amount or
frequency. It is impossible to predict who the (on average) five people
are that an informant forgot to mention that she or he had had communication
with. It is impossible to predict the people an informant claimed
to communicate with but did not. And, finally, although the structures
of recall and behavioral data are correlated, the scatter remains
far too high to use one as a proxy for the other.

XII.  Discussion 

We began this series of papers in 1973 because we distrusted
conclusions drawn by network researchers (including ourselves) about the
structure of communications in human groups. We had no reason to distrust
the motives of our (or anyone else's) informants. As far as we know, if a
researcher inquires about an informant's communications, the data obtained
are an accurate (i.e., honest) description of how the informant believes
he or she communicates. We continue to assume that the amount, frequency,
and persons involved all accurately represent the informant's view of his
or her network. However, one consistent and unavoidable conclusion has
emerged from our studies of informant accuracy in network data: what
people say, despite their presumed good intentions, bears no useful resemblance
to their behavior.

This immediately makes suspect all forms of the instruments "what
do you ___?" and "who do you ___?" It may very well be that peasant
farmers can report accurately how many bushels of wheat they harvested last
year, or it may not be. It appears that people's reports of their voting
behavior are accurate, if the data are gathered immediately. (What
proportion of the population today would claim to have voted for Richard
Nixon in 1972?) On the other hand, asking people about their consumption
of goods and services produces appalling results. As far as accuracy of
recall about communication is concerned, the only thing people have ever
recalled accurately in our experiments is who the most "popular" people
are in their group. (By "popular" we mean who in the group is communicated
with the most.) Even then, informants get the most popular individual
wrong most of the time.

We feel that it is vital in any field to have accurate (not just
reliable) data. It is virtually impossible to develop a theory for any
process unless one can obtain accurate data about that process. This must
be just as true for human communications (and interactions in general) as
for black holes, DNA molecules, or the movement of tectonic plates. Still,
it is obvious that in research on human beings in natural settings, acquiring
full, accurate data on their behavior is nearly impossible. We have been
able to achieve this only because we selected groups whose behavior could
be monitored, and not because of any interest we might have had in the
groups themselves. Our interest has been exclusively methodological.

There  are  at  least  two  ways  to  treat  the  dilemma  of  needing  accurate 
data  and  not  having  any.  Both  ways  are  important  and  should  be  implemented. 

The first requires the collection of behavioral data in natural settings
on child rearing practices, alcohol consumption, leisure activities,
health care activities: in short, on everything in which social scientists
are interested, and for which they normally rely on recall data. It is
not necessary (we hope) to collect full, matched sets of recall and behavioral
data such as we have done in our program of methodological studies. It
should be sufficient to obtain, for each behavior being studied, a sample
from the population, in order to calibrate the data obtained from informants'
recall. It logically follows that we should not pretend to study
quantitatively things that cannot be measured by direct observation, or
at least by using accurate and calibrated (if indirect) instruments.

The second way is to seek other quantities, hitherto unmeasured, which
may be accounting for inaccurate recall. Quantities which come to mind are
motivation, content, importance, meaning, ecological conditions, population
density, norms, detail of the interview procedure, and so on. These
quantities need to be defined, then collected (accurately!), and finally
checked to see if they are related to, or predict, the behavior which we
are trying to study. We cannot simply "blame" inaccurate data on these
quantities until and unless we have examined whether this is the case. So
far, everything we have tested fails to account for inaccuracy. The
unpleasant possibility is that nothing accounts for variations in
accuracy, except individual (that is, random) differences ...

FOOTNOTES


1. Burt and Bittner (1980) have pointed out that the clique-finders
which we used do not necessarily produce statistically adequate
subgroups. We support their call for testing the adequacy of subgroups,
and we note that this has not been done until very recently
with the advent of algorithms for doing so. We have a gnawing suspicion
that this will only further invalidate much of sociometric
and social network research.

2. A copy of the two page invitation letter, and full documentation of
the experiment, is contained in a technical report, available from
the authors; the Office of Naval Research, Code 452, Arlington, Va.,
22217; or NTIS. The report is called "An experiment on the degradation
of accuracy in human recall of communications" (see Bernard, Killworth,
and Sailer 1979) and contains a codebook for the publicly available
tape of the data from the experiment. The tape is available
from Bernard.

3. The removal of certain measures from consideration when, for example,
NR or NA is zero, may appear to bias the averages which follow in
the text. (We defer consideration of NA or NR with values of zero
until Section VII.) The averages of various measures quoted in
this paper are biased in a statistical sense, due to the starting
involved. This results in a shift toward 0.5 in all fraction-type
measures; high inaccuracy is decreased by this, low inaccuracy is
increased. The differences are numerically small except at extreme
cases, near zero or unity, when they increase to about 10%. Monte
Carlo simulations show that the mean of the unstarted fractions is
unbiased but inefficient; the mean of the started fractions is biased
but more efficient. Opinions were divided between the authors as
to which is the better approach. In the end, it is probably a
question of each researcher's background.

4. We realize that we do not have independent cases, normal distributions,
etc. We use the word "significant" to mean sizable, or notable,
or whatever. The probabilities are those produced by the statistical
packages and are included for information rather than as statements
about some population.

5. We are indebted to Ronald Burt for discussions leading to this
investigation.

CAPTIONS FOR FIGURES 1-6

Figure 1. Contours of the means of the ACCT (inaccuracy "to") measure as
a function of lag and width. Both lag and width are expressed
in days, on a log-log scale for clarity (i.e. "in the last n
days" corresponds to the upright axis). Contours are every 0.05,
labelled every 0.1. The heavy dots indicate the location of the
27 windows; the sparsity of data in the upper left quadrant
means that the smoother contours there should be interpreted
with caution. The minimum value is 0.30 (lag = width = 1); maximum
0.74 (lag = 14, width = 1).

Figure 2. Contours of the means of the ACCF (inaccuracy "from") measure,
displayed as in Figure 1. Minimum value 0.24 (lag = 1, width = 2);
maximum 0.81 (lag = 30, width = 1).

Figure 3. Contours of the means of the ACC2 (inaccuracy "to and from")
measure, displayed as in Figure 1. Minimum value 0.45 (lag =
width = 1); maximum 0.85 (lag = 14, width = 1).

Figure 4. Contours of the means of the ACCTOP1 (inaccuracy "top ranked
person") measure, displayed as in Figure 1. Minimum value 0.32
(lag = 1, width = 2); maximum 0.83 (lag = 14, width = 1).

Figure 5. Contours of the means of the ACC20 (inaccuracy "error in ranking
by ±10%") measure, displayed as in Figure 1. Minimum value 0.47
(lag = 1, width = 7); maximum 0.75 (lag = 14, width = 1).

Figure 6. Histograms of the minimum percentage of total message or line
communication required for accuracy. If an informant reports
communication with someone above this percentage cutoff, then
that person is in fact communicated with. Below the cutoff,
this may not be true. The solid bars show messages; the plain
bars, lines.


